Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


Posters

Poster presentations at ISMB 2020 will be presented virtually. Authors will pre-record their poster talk (5-7 minutes) and will upload it to the virtual conference platform site along with a PDF of their poster. All registered conference participants will have access to the poster and presentation through the conference and content until October 31, 2020. There are Q&A opportunities through a chat function to allow interaction between presenters and participants.

Preliminary information on preparing your poster and poster talk is available at: https://www.iscb.org/ismb2020-general/presenterinfo#posters

Ideally authors should be available for interactive chat during the times noted below:


Poster Session A: July 13 and July 14, 7:45 am - 9:15 am Eastern Daylight Time
Poster Session B: July 15 and July 16, 7:45 am - 9:15 am Eastern Daylight Time
July 14 between 10:40 am - 2:00 pm EDT
A Data-engineering Pipeline for Deep Learning in Structural Bioinformatics
COSI: 3DSIG COSI
  • Eli Draizen, University of Virginia, United States
  • Cameron Mura, University of Virginia, United States
  • Philip Bourne, University of Virginia, United States

Short Abstract: Machine learning has a rich history in structural bioinformatics, and modern approaches such as deep learning are already revolutionizing our knowledge of the subtle relationships between biomolecular sequence, structure, function & evolution. The advances—indeed, any progress relying on statistical learning approaches (biology or beyond)—are enabled by large volumes of data that are also intelligible or manipulable (parseable, machine-readable, etc.). In structural bioinformatics, such data often relate, directly or indirectly, to protein 3D structures. A significant (and often recurring) challenge concerns the creation of large, high-quality, openly accessible datasets that can be used for specific training/benchmarking tasks, e.g. in predictive modeling projects (predicting 3D structures, protein interactions, etc.). Here, we report a protein biophysical and evolutionary featurization and data-processing pipeline that we recently developed and deployed (both in the cloud and on local HPC resources) in order to systematically and reproducibly create comprehensive, superfamily-level domain databases for deep learning tasks (e.g., for structure classification and predicting domain interactions). While motivated by specific problems, we believe this robust computational pipeline could be of broader utility for other structure-related workflows (i.e., as a community-wide resource), particularly as arise at the intersection of deep learning and structural bioinformatics.

A graph theory approach to detect the clustering of functional annotations in the three-dimensional structure of genomes
COSI: 3DSIG COSI
  • Dallas Nygard, University of Ottawa, Canada
  • Julie St-Pierre, University of Ottawa, Canada
  • Mathieu Lavallée-Adam, University of Ottawa, Canada

Short Abstract: Chromatin Conformation Capture (3C) technologies have allowed the exploration of chromatin spatial structure and the determination of chromatin regions that are in close proximity in the nucleus. Evidence gathered using these technologies suggests that genes involved in similar biological processes tend to cluster together in chromatin regions for regulation purposes, but the extent of this form of functional chromatin arrangement is largely uncharacterized. To quantify the degree to which genes participating in the same biological function or metabolic pathway cluster in three-dimensional (3D) nuclear space, we propose a novel graph theory-based algorithm that incorporates spatial information from publicly available Hi-C datasets from ENCODE and functional annotations from the Gene Ontology, REACTOME, and KEGG databases. Our tool uses a Monte Carlo sampling approach to assess the clustering significance of genes sharing the same functional annotation in chromatin 3D structure. Using our software, a spatial chromatin network has been created for the LNCaP clone FGC cell line, and the package has detected Gene Ontology terms for which genes are clustered more tightly in 3D space than expected by chance. Finally, our algorithm provides a better understanding of the functional implications of chromatin folding and higher order chromatin structures.
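
The Monte Carlo significance test described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical pairwise distance function between genes (a toy 1D position difference standing in for distances in a Hi-C-derived network) and uses the mean pairwise distance as the clustering statistic. Neither choice is taken from the poster, and all names are illustrative only.

```python
import random
from itertools import combinations

def mean_pairwise_distance(genes, dist):
    """Average distance over all unordered pairs of genes."""
    pairs = list(combinations(genes, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

def clustering_p_value(annotated, all_genes, dist, n_samples=1000, seed=0):
    """Empirical p-value: how often a random gene set of the same size
    is at least as tightly clustered as the annotated set."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(annotated, dist)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(all_genes, len(annotated))
        if mean_pairwise_distance(sample, dist) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_samples + 1)

# Toy data (hypothetical): 20 genes placed on a line.
positions = {f"g{i}": i for i in range(20)}

def toy_dist(a, b):
    return abs(positions[a] - positions[b])

all_genes = list(positions)
annotated = ["g0", "g1", "g2"]  # three adjacent genes sharing an annotation
p = clustering_p_value(annotated, all_genes, toy_dist)
print(f"empirical p-value: {p:.4f}")
```

In a real analysis the distance function would come from the spatial chromatin network, and one test would be run per annotation term, which would then call for multiple-testing correction.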

A large-scale structural and evolutionary analysis of protein loop regions
COSI: 3DSIG COSI
  • Lin Zhang, Tohoku University, Japan
  • Hafumi Nishi, Tohoku University, Japan

Short Abstract: Protein loop regions are often involved in forming binding sites and enzyme active sites. However, general structural properties and biological characteristics of protein loops remain unclear. Here we study the structural and evolutionary aspects of protein loops using CATH domain classification. We prepared three loop datasets from the whole PDB (152,933), human (46,290), and E. coli (9,677) proteins. The numbers of loop regions were 734,591, 181,016, and 57,467, respectively. We found that the human loop dataset was enriched in Lysine, Serine, and Cysteine compared to the E. coli set. The CATH classification analysis revealed that the terminal residues of a given loop could be classified into different superfamilies. Loops of this type displayed higher percentages of Leucine, Proline, and Valine. Additionally, even when partial loops from different proteins shared 100% sequence identity, their terminal residues could still differ in superfamily composition among the proteins. Further studies suggested that the first level of the CATH hierarchy mainly contributed to the heterogeneity. Approximately 1.8% of unique loop sequences of human proteins were also found in the E. coli dataset. The proportion of conserved loops in human proteins rose to 7.3% before redundancy removal, indicating that they tended to be repetitive.

A multiresolution optimization strategy for inferring 3D genome architecture from Hi-C data
COSI: 3DSIG COSI
  • Alexandra Gesine Cauer, University of Washington, United States
  • Galip Yardimci, University of Washington, United States
  • Jean-Philippe Vert, Google, France
  • Nelle Varoquaux, University of California, Berkeley, France
  • Alexandra Cauer, University of Washington, United States

Short Abstract: The three-dimensional organization of the genome plays an important part in regulating numerous basic cellular functions, including gene regulation, differentiation, the cell cycle, DNA replication, and DNA repair. Assays like Hi-C measure DNA-DNA contacts in a high-throughput fashion, and inferring accurate 3D models of chromosomes can yield insights hidden in the raw data. However, inference on low-coverage or high-resolution data is challenging, as is inference of diploid genomes. Previous haploid structural inference methods have successfully addressed the difficulties presented by low-coverage or high-resolution data via multiscale optimization, an optimization strategy that solves a large optimization problem by building upon the solutions to smaller versions of the problem. Because many organisms of interest are diploid, we sought to develop a multiscale optimization approach that infers the structure of diploid genomes. We use simulations to show that integrating multiscale optimization with a previously published diploid inference method dramatically facilitates convergence and improves the accuracy of inferred structures.

ARIAWeb: a new web service for automated NMR structure calculation with ARIA
COSI: 3DSIG COSI
  • Fabien Mareuil, Institut Pasteur, France
  • Fabrice Allain, Institut Curie, France
  • Michael Nilges, Institut Pasteur, France
  • Benjamin Bardiaux, CNRS - Institut Pasteur, France
  • Hervé Ménager, Institut Pasteur, Paris, France

Short Abstract: Protein structure determination is crucial to understand protein function, protein interactions, and to discover new approaches to control pathological biological processes. Nuclear Magnetic Resonance (NMR) spectroscopy is a method of choice to study the dynamics and the structure of macromolecules. The software ARIA (Ambiguous Restraints for Iterative Assignment) automates treatment of NMR data and calculation of protein structures by molecular dynamics simulation.

Currently, its usage is hindered by the complexity of the installation and execution of the software. To enhance the visibility and usability of the software, we have implemented a new web interface for ARIA, ARIAWeb. This application provides a simple, yet highly flexible interface to configure ARIA calculations, with a Galaxy server that handles the execution of the ARIA jobs.

The web interface guides users through the various configuration steps to progressively set the required input parameters to consistent values. Additionally, results from an ARIA calculation, such as 3D structures and NMR restraints statistics, are displayed with graphical and interactive representations.

ARIAWeb is freely available at ariaweb.pasteur.fr.


F. Allain, F. Mareuil, et al. ARIAweb: a server for automated NMR structure calculation (2020) Nucleic Acids Research (accepted for publication)

BIO-GATS: A tool for automated GPCR template selection through a biophysical approach for homology modeling.
COSI: 3DSIG COSI
  • Amara Jabeen, Macquarie University, Australia
  • Ramya Vijayram, Indian Institute of Technology Madras, Chennai, India
  • Shoba Ranganathan, Macquarie University, Australia

Short Abstract: G protein-coupled receptors (GPCRs) form the largest family of membrane proteins, with more than 800 members, each comprising seven transmembrane (TM) domains. GPCRs are involved in numerous physiological functions within the human body and are the target of more than 30% of the drugs approved by the US Food and Drug Administration. At present, 64 unique receptors have known experimental structures. The absence of experimental structures for the majority of GPCRs creates a demand for homology models in structure-based drug discovery workflows. Homology modeling requires appropriate templates. Common methods for template selection consider sequence identity. However, sequence identity among the TM domains of GPCRs is low. Sequences with similar patterns of hydrophobic residues are often structural homologues even when they share low sequence identity. We propose a novel biophysical approach for template selection based on hydrophobicity correspondence between the target and the template. The approach also takes into consideration other parameters, including sequence identity, resolution, and query coverage. The proposed approach has been implemented as a graphical user interface. We applied the approach to an olfactory receptor (OR) and present a comprehensive comparison between the templates for ORs based on our template selection criteria.

Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization
COSI: 3DSIG COSI
  • Spencer Krieger, University of Arizona, United States
  • John Kececioglu, University of Arizona, United States

Short Abstract: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their prediction do not explicitly leverage the vast number of proteins with known structure. Leveraging this additional information in a template-based method has the potential to significantly boost prediction accuracy.

We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses nearest neighbor search over a template database of fixed-length words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically-valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, that estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods.

On challenging CASP benchmarks, our hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2-10%, and Q3 accuracy by more than 1-3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction.

A preliminary implementation in a tool called Nnessy is available at nnessy.cs.arizona.edu.
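
The core idea of estimating per-residue class probabilities by nearest neighbor search over fixed-length words can be sketched as follows. This is only an illustration under stated assumptions: it uses Hamming distance and simple voting among the k nearest template words, and the tiny template list is hypothetical. The abstract does not specify the distance function, word length, or probability estimator that Nnessy uses, and the dynamic programming and accuracy estimation steps are not shown.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def class_probabilities(word, templates, k=3):
    """Estimate class probabilities for the central residue of `word`
    by voting among its k nearest template words.

    `templates` is a list of (word, label) pairs, where label is the
    secondary structure class (H, E, or C) of the word's central residue.
    """
    nearest = sorted(templates, key=lambda t: hamming(word, t[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return {c: counts[c] / len(nearest) for c in "HEC"}

# Hypothetical template database of 5-residue words.
templates = [
    ("AAAAA", "H"),
    ("AAAAL", "H"),
    ("VVVVV", "E"),
    ("VIVIV", "E"),
    ("GPGPG", "C"),
]
probs = class_probabilities("AAALA", templates)
print(probs)
```

Sliding such a window along a protein would give one probability vector per residue, which is the kind of input a downstream maximum-likelihood step could consume.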

Co-evolutionary distance predictions are informative of flexibility
COSI: 3DSIG COSI
  • Dominik Schwarz, University of Oxford, Department of Statistics, United Kingdom
  • Charlotte M Deane, University of Oxford, Department of Statistics, United Kingdom

Short Abstract: Co-evolutionary distance predictions have improved de novo protein structure prediction. We examined their potential to predict stability or flexibility of residue pairs. The database of Conformational Diversity in the Native State (CoDNaS) stores multiple PDB structures of a single protein sequence. We used the two most different structures of a protein (by RMSD) to estimate the flexibility of residue pairs. Distance predictions derived from DMPfold were generated for this subset. The shape of distance probability distributions was found to be informative of stability or flexibility of residue pairs.
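
One simple way to turn two conformers of the same sequence into a per-pair flexibility estimate is to compare inter-residue distances between them. The sketch below assumes the flexibility of a residue pair is measured as the absolute change in distance between the two structures. The abstract does not state the exact measure used, so this definition, the function name, and the toy coordinates are assumptions.

```python
import math
from itertools import combinations

def pair_distance_changes(coords_a, coords_b):
    """For every residue pair (i, j), return the absolute difference
    between its distance in structure A and its distance in structure B.

    Each coords list holds one (x, y, z) tuple per residue, in the same
    residue order for both structures.
    """
    changes = {}
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        changes[(i, j)] = abs(d_a - d_b)
    return changes

# Toy coordinates (hypothetical): residue 2 moves between conformers.
conformer_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
conformer_b = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
changes = pair_distance_changes(conformer_a, conformer_b)
print(changes)
```

Pairs with a change near zero behave as stable in this toy measure, while larger values mark pairs whose distance differs between the two conformers.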

Comparison of RNA-DNA and DNA-DNA hybrids interaction with Osmium (II) redox probe using computational docking approach
COSI: 3DSIG COSI
  • Anshul Nigam, Amity University Maharashtra, India
  • Sarra Akermi, Annotation Analytics Pvt. Ltd., Tunisia
  • Sunil Kumar Jayant, Annotation Analytics Pvt. Ltd., India

Short Abstract: Detecting the genetic material of microorganisms in complex biological samples with a robust, sensitive, fast, and cheap method remains a challenge. Recently, the limitations of the reverse transcriptase polymerase chain reaction (RT-PCR) method have been encountered in the detection of COVID-19 infections because it is time-consuming and expensive. However, improved electrochemical methods with metallic intercalators have been studied experimentally and theoretically for rapid detection of DNA. Here, we propose a proof-of-concept method using an in silico docking approach to compare the interactions of RNA-DNA and DNA-DNA hybrids with an intercalating Osmium (II) redox probe. 3D structures of 10-base-pair RNA-DNA and DNA-DNA hybrids were generated with Discovery Studio software. Molecular docking with AutoDock 4.2 revealed that the RNA-DNA hybrid interacts strongly with the Osmium (II) redox probe, with a docking energy of -10.37 kcal/mol compared to -7.25 kcal/mol for the DNA-DNA hybrid. Therefore, our study predicts that an electrochemical method using the Osmium (II) redox probe for RNA-DNA hybrids could be an alternative for detection of COVID-19 infection.

Computational design of high-affinity peptides bound to the Major Histocompatibility Complex class II
COSI: 3DSIG COSI
  • Rodrigo Ochoa, Max Planck Tandem Group, Biophysics of Tropical Diseases, University of Antioquia, Colombia
  • Alessandro Laio, International School for Advanced Studies - SISSA, Italy
  • Pilar Cossio, Max Planck Tandem Group, Biophysics of Tropical Diseases, University of Antioquia, Colombia

Short Abstract: The availability of X-ray crystal or nuclear magnetic resonance structures of interacting proteins reveals crucial properties involved in protein-protein and protein-peptide interactions. Here we developed a computational peptide-design protocol that mutates and selects optimal amino acids using a stochastic approach with molecular dynamics simulations and binding scoring functions. The protocol was applied to design peptides that bind with higher affinity to the Major Histocompatibility Complex (MHC) class II. The protocol consisted of mutating a random amino acid in the peptide structure, followed by sampling of the complex conformations using molecular dynamics simulations. These were used to calculate an average binding score to select mutations that can increase the binding affinity in the simulated environment. We used as a template a crystal structure of the MHC class II allele DRB1*01:01 (PDB id 1dlh) bound to a peptide of 13 amino acids. After performing five design runs of 100 mutation attempts, we selected a set of modified peptide sequences with better binding scores and suitable physicochemical properties. The new bound peptides were subjected to 200 ns of MD, and the same scoring functions were applied to determine the binding differences against the original peptide.
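
The mutate-and-select loop described in this abstract can be sketched in a few lines. The sketch makes strong simplifications that are assumptions, not the authors' protocol: the molecular dynamics sampling and binding scoring functions are replaced by a hypothetical toy score, a mutation is accepted only if it strictly improves that score, and the starting 13-residue sequence is a placeholder.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_score(peptide):
    """Hypothetical stand-in for an averaged binding score (lower is better).
    It simply rewards hydrophobic residues and has no physical meaning."""
    return -sum(peptide.count(a) for a in "FILVWY")

def design(peptide, score, n_attempts=100, seed=0):
    """Run one design pass: mutate a random position, keep the mutation
    only if the score improves."""
    rng = random.Random(seed)
    best, best_score = peptide, score(peptide)
    for _ in range(n_attempts):
        pos = rng.randrange(len(best))
        new_residue = rng.choice(AMINO_ACIDS)
        candidate = best[:pos] + new_residue + best[pos + 1:]
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

start = "GSGSGSGSGSGSG"  # placeholder 13-residue peptide
designed, final_score = design(start, toy_score)
print(designed, final_score)
```

A Metropolis-style acceptance rule that occasionally keeps worse mutations would be a natural alternative to the greedy rule used here, at the cost of one extra temperature parameter.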

Computational epitope binning of protein binders
COSI: 3DSIG COSI
  • Jarjapu Mahita, Dartmouth College, United States
  • Dong-Gun Kim, Korea Advanced Institute of Science and Technology (KAIST), South Korea
  • Yoonjoo Choi, Korea Advanced Institute of Science and Technology (KAIST), South Korea
  • Hak-Sung Kim, Korea Advanced Institute of Science and Technology (KAIST), South Korea
  • Chris Bailey-Kellogg, Dartmouth College, United States

Short Abstract: Recent advances in next-generation sequencing technologies have enabled high-throughput characterization of repertoires comprised of protein binders. Attributing the sequence of a protein binder to its function is possible through structural elucidation, a technique unsuitable for large-scale structure determination of protein sequences. Epitope binning is emerging as a versatile tool for these purposes, by enabling identification of binders likely to target similar epitopes on the antigens and subsequent categorization into bins. The limitations of experimental epitope binning in the face of the vast sequence space necessitate computational methods. We describe a computational epitope binning method that utilizes a scoring scheme we developed. To test the reliability of this method, we applied it to bin a phage-displayed library of IL6-binding repebodies, which are binding scaffolds containing leucine-rich repeat (LRR) modules. Results of our method were validated using experimental epitope binning. We further show how the output of our binning method was used to drive mutagenesis experiments for narrowing down residues contributing to the specificity of each bin. Overall, the results demonstrate the utility of our method and indicate that it is a promising strategy to reliably bin protein binders.

Computational ligand screening and molecular dynamics targeting inhibitors for the nicotinic acetylcholine receptor of Halyomorpha halys
COSI: 3DSIG COSI
  • Beatriz Pereira Nascimento, Universidade Estadual do Sudoeste da Bahia, Brazil
  • Bruno Andrade, Universidade Estadual do Sudoeste da Bahia, Brazil

Short Abstract: Halyomorpha halys, also known as the brown marmorated stink bug (BMSB), is native to East Asia and has become one of the main urban and agricultural pests, present in various crops and fruit orchards in many countries. Populations resistant to neonicotinoid-based compounds are growing every year, so new strategies for pest management are required. The aim of this work was to construct the 3D structure of the BMSB nicotinic acetylcholine receptor (nAChR) and to search for molecules that can act as new inhibitors of this protein. The 3D structure of nAChR was modeled using the SWISS-MODEL workspace (swissmodel.expasy.org/). Ligand searching was carried out with the OpenEye ROCS software, which considered pharmacophoric characteristics of previously described nAChR inhibitors, in a dataset of structures downloaded from the ZINC database (zinc15.docking.org/). Ligand searching was performed for different known inhibitor chemical classes, among which we highlight the pyrethroid and neonicotinoid compounds. In a second step, we performed molecular docking for the selected ligands using AutoDock Vina with the H. halys nAChR structure. Molecular dynamics simulations of the protein-ligand complexes were performed for the three best complexes from the docking studies using the AMBER 14 package.

Deep Learning Protein Contacts and Real-valued Distances Using PDNET
COSI: 3DSIG COSI
  • Badri Adhikari, University of Missouri-St. Louis, United States

Short Abstract: As deep learning algorithms drive the progress in protein structure prediction, a lot remains to be studied at this emerging crossroads of deep learning and protein structure prediction. Recent findings show that inter-residue distance prediction, a more granular version of the well-known contact prediction problem, is key to predicting accurate models. We believe that deep learning methods that predict these distances are still in their infancy. To advance these methods and develop other novel methods, we need a small and representative dataset packaged for fast development and testing. In this work, we introduce Protein Distance Net (PDNET), a dataset derived from the widely used DeepCov dataset consisting of 3456 representative protein chains. It is packaged with all the scripts that were used to curate the dataset and generate the input features and distance maps, along with scripts for training, validating, and testing deep learning models. Deep learning models can also be trained and tested in a web browser using free platforms such as Google Colab. We discuss how our framework can be used to predict contacts, distance intervals, and real-valued distances. PDNET is available at github.com/ba-lab/pdnet/.

DELPHI: accurate deep ensemble model for protein interaction sites prediction
COSI: 3DSIG COSI
  • Yiwei Li, University of Western Ontario, Canada
  • Lucian Ilie, University of Western Ontario, Canada

Short Abstract: Motivation: Proteins usually perform their functions by interacting with other proteins, which is why accurately predicting protein-protein interaction (PPI) binding sites is a fundamental problem. Experimental methods are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods.
Methods and Results: We propose DELPHI (DEep Learning Prediction of Highly probable protein Interaction sites), a new sequence-based deep learning suite for PPI binding sites prediction. DELPHI has an ensemble structure with data augmentation. The model structure combines a convolutional neural network and a recurrent neural network with fine tuning. Three novel features, ProtVec1D, position information, and high-scoring segment pair (HSP), are used in addition to nine existing ones. We comprehensively compare DELPHI to nine state-of-the-art programs on five datasets, and DELPHI outperforms competitors in all metrics. The trained model, source code for training, predicting, feature computation, and data processing are made freely available online.
Conclusion: DELPHI has a novel network architecture in which three features are used for the first time in this problem. DELPHI is shown to be more accurate than the current state-of-the-art programs. All components of DELPHI are freely available online.

Development of Enhanced Conformational Sampling Methods for GPCRs
COSI: 3DSIG COSI
  • Erik Serrano, California State University, Northridge, United States
  • Rafeed Khleif, California State University, Northridge, United States
  • Ravinder Abrol, California State University, Northridge, United States

Short Abstract: G protein-coupled receptors (GPCRs) are known to possess multiple active conformational states in nature. However, studying these conformations is extremely challenging, as the majority of the functionally important conformations have high energy compared to the lowest energy conformation. We are developing a method called Enhanced Conformational Markov-state Sampling in Membrane BiLayer Environment (EnCoMSeMBLE) that would enhance our search of the conformational landscape of GPCRs and that can be applied to α-helical transmembrane proteins in general. It enables a level of conformational sampling not achievable by classical or accelerated molecular dynamics (MD) simulations or Markov-State Models (MSM). This method combines brute force conformational sampling of helix-helix interactions in the membrane with MD simulations and Markov-State modeling to identify functionally important conformations. This method is applied to the glucagon-like peptide-1 receptor (GLP1-R), a class B GPCR, and the muscarinic acetylcholine M2 receptor, a class A GPCR, both of which have been crystallized in inactive as well as active conformations. The comparison of the activation landscapes of class A and class B GPCRs is beginning to reveal key similarities and differences in activation across these very distinct GPCRs. This detailed understanding of GPCR activation complements major structural biology efforts underway targeting GPCRs.

DISTEVAL: A web-server for evaluating predicted protein distances
COSI: 3DSIG COSI
  • Badri Adhikari, University of Missouri-St. Louis, United States
  • Bikash Shrestha, University of Missouri-St. Louis, United States
  • Matthew Bernardini, University of Missouri-St. Louis, United States

Short Abstract: Predicted protein inter-residue contacts and distances are the key intermediate steps towards accurate protein structure prediction. Distance prediction or distance-range prediction is a more granular version of the contact prediction problem and it is now introduced as a new challenge in the CASP14 experiment. Despite the recent proliferation of methods for predicting distances, no methods currently exist for evaluating predicted distances. This work discusses a new web-server for evaluating predicted protein inter-residue distances. The server accepts predicted contacts or distances along with a true structure as input. It generates informative chord diagrams and heat maps to facilitate visual assessment and evaluates predictions using MAE and the standard ‘contact precision’ metric. Our tool, DISTEVAL, is available at deep.cs.umsl.edu/disteval/.
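
The two metrics named in this abstract, mean absolute error (MAE) and contact precision, can be sketched as below. This is not DISTEVAL's implementation. The 8 Å contact threshold is the conventional definition, while the ranking of pairs by smallest predicted distance, the choice of `top_k`, the function names, and the toy distances are all assumptions made for illustration.

```python
def mean_absolute_error(pred, true):
    """MAE over residue pairs present in both distance maps.
    Each map is a dict from (i, j) residue pairs to distances."""
    pairs = pred.keys() & true.keys()
    return sum(abs(pred[p] - true[p]) for p in pairs) / len(pairs)

def contact_precision(pred, true, top_k, threshold=8.0):
    """Fraction of the top_k pairs with the smallest predicted distance
    that are true contacts (true distance below the threshold)."""
    shared = [p for p in pred if p in true]
    ranked = sorted(shared, key=lambda p: pred[p])[:top_k]
    hits = sum(1 for p in ranked if true[p] < threshold)
    return hits / len(ranked)

# Toy distance maps (hypothetical values).
predicted = {(0, 5): 6.0, (0, 6): 7.0, (1, 7): 12.0, (2, 9): 5.0}
native = {(0, 5): 7.0, (0, 6): 9.5, (1, 7): 11.0, (2, 9): 5.5}

mae = mean_absolute_error(predicted, native)
precision_top2 = contact_precision(predicted, native, top_k=2)
precision_top3 = contact_precision(predicted, native, top_k=3)
print(mae, precision_top2, precision_top3)
```

In practice, evaluations of this kind usually restrict the ranked pairs by sequence separation and tie `top_k` to the protein length. Both refinements are omitted here for brevity.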

EM Map Segmentation and De Novo Protein Structure Modeling for Multiple Chain Complexes with MAINMAST
COSI: 3DSIG COSI
  • Genki Terashi, Department of Biological Sciences, Purdue University, United States
  • Yuki Kagaya, Tohoku University, Japan
  • Daisuke Kihara, Purdue University, United States

Short Abstract: The significant progress of cryo-electron microscopy (cryo-EM) poses a pressing need for software for structural interpretation of EM maps. Methods for map segmentation are particularly needed for modeling because most modeling methods are designed for building a single protein structure. Here, we developed new software, MAINMASTseg, for segmenting maps with symmetry. Unlike existing segmentation methods that merely consider densities in an input EM map, MAINMASTseg captures underlying molecular structures by constructing a skeleton that connects local dense points in the map. MAINMASTseg performed significantly better than other popular existing methods.

Environmental conditions shape the nature of a minimal bacterial genome
COSI: 3DSIG COSI
  • Magdalena Antczak, University of Kent, United Kingdom
  • Mark Wass, Industrial Biotechnology Centre and School of Biosciences, University of Kent, Canterbury, UK, United Kingdom
  • Martin Michaelis, Industrial Biotechnology Centre and School of Biosciences, University of Kent, Canterbury, UK, United Kingdom

Short Abstract: Of the 473 genes in the genome of the bacterium with the smallest genome generated to date, 149 genes have unknown function, emphasising a universal problem: less than 1% of proteins have experimentally determined annotations. Here, we combine the results from state-of-the-art in silico methods for functional annotation and assign functions to 66 of the 149 proteins. Proteins that are still not annotated lack orthologues, lack protein domains, and/or are membrane proteins. Twenty-four likely transporter proteins are identified, indicating the importance of nutrient uptake into and waste disposal out of the minimal bacterial cell in a nutrient-rich environment after removal of metabolic enzymes. Hence, the environment shapes the nature of a minimal genome. Our findings also show that the combination of multiple different state-of-the-art in silico methods for annotating proteins is able to predict functions, even for difficult-to-characterise proteins, and to identify crucial gaps for further development.

Evidence of Antibody Repertoire Functional Convergence through Public Baseline and Shared Response Structures
COSI: 3DSIG COSI
  • Matthew Raybould, University of Oxford, United Kingdom
  • Claire Marks, University of Oxford, United Kingdom
  • Aleksandr Kovaltsuk, University of Oxford, United Kingdom
  • Alan Lewis, GlaxoSmithKline, United Kingdom
  • Jiye Shi, UCB Pharma, United Kingdom
  • Charlotte Deane, University of Oxford, United Kingdom

Short Abstract: The antibody repertoires of different individuals ought to exhibit significant functional commonality, given that most pathogens trigger a successful immune response in most people. Sequence-based antibody repertoire analysis based on identifying common genetic origins and high sequence identities has so far offered little evidence for this phenomenon. However, to engage the same epitope, antibodies only require a similar binding site structure and the presence of key paratope interactions, which can occur even when their sequences are dissimilar. Here, we investigate functional convergence in human antibody repertoires by comparing the antibody structures they contain. We first structurally profile baseline antibody diversity, predicting all modellable distinct structures within each repertoire. This analysis uncovers a high degree of structural commonality. For instance, around 3% of distinct structures are common to snapshots from ten unrelated individuals (‘Public Baseline’ structures). We then apply the same structural profiling method to the repertoire snapshots of three individuals before and after flu vaccination, detecting a convergent structural drift indicative of recognising similar epitopes (‘Public Response’ structures). Antibody Model Libraries (AMLs) derived from Public Baseline and Public Response structures represent a powerful geometric basis set of low-immunogenicity candidates exploitable for general or target-focused therapeutic antibody screening.

Evolutionary pathways of repeat protein topology in bacterial outer membrane proteins
COSI: 3DSIG COSI
  • Joanna Slusky, University of Kansas, United States
  • Meghan Franklin, University of Kansas, United States
  • Sergey Nepomnyachiy, University of Haifa, Israel
  • Ryan Feehan, University of Kansas, United States
  • Rachel Kolodny, University of Haifa, Israel
  • Nir Ben Tal, Tel-Aviv University, Israel

Short Abstract: Outer membrane proteins (OMPs) are the proteins in the surface of Gram-negative bacteria. These proteins have diverse functions but a single topology: the β-barrel. Sequence analysis has suggested that this common fold is a β-hairpin repeat protein, and that amplification of the β-hairpin has resulted in 8–26-stranded barrels. Using an integrated approach that combines sequence and structural analyses, we find events in which non-amplification diversification also increases barrel strand number. Our network-based analysis reveals strand-number-based evolutionary pathways, including one that progresses from a primordial 8-stranded barrel to 16-strands and further, to 18-strands. Among these pathways are mechanisms of strand number accretion without domain duplication, like a loop-to-hairpin transition. These mechanisms illustrate perpetuation of repeat protein topology without genetic duplication, likely induced by the hydrophobic membrane. Finally, we find that the evolutionary trace is particularly prominent in the C-terminal half of OMPs, implicating this region in the nucleation of OMP folding.

Frustration leads to fuzzy interactions in disordered proteins
COSI: 3DSIG COSI
  • Viktor Ambrus, Laboratory of Protein Dynamics, University of Debrecen, Hungary
  • Peter Wolynes, Center for Theoretical Biological Physics, Rice University, United States
  • Diego Ferreiro, Laboratorio de Fisiología de Proteínas, IQUIBICEN N-CONICET, FCEyN, Universidad de Buenos Aires, Argentina
  • Monika Fuxreiter, Laboratory of Protein Dynamics, University of Debrecen, Hungary
  • Maria I. Freiberger, Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica IQUIBICEN N-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires; Laboratory of Protein Dynamics, University of Debrecen, Hungary; Center for Theoretical Biological Physics, Rice University, Houston, USA, Argentina

Short Abstract: Background:
While proteins fold, strong energetic conflicts are minimized towards their native states according to the “Principle of Minimal Frustration”. Local violations of this principle allow proteins to encode the complex energy landscapes required for active biological functions.
Disordered proteins often exhibit templated folding and adopt a well-defined structure upon binding. These complexes, however, are fuzzy, as they adopt different binding modes with different partners.

Description:
We have performed a systematic analysis of frustration on complexes of 138 disordered proteins. These proteins contained disordered regions in the free form and exhibited different binding modes in the bound form. Disorder-to-order regions (DORs) folded upon binding; disorder-to-disorder regions (DDRs) remained disordered with the partner; and many of the regions were context-dependent (CDRs), observed in both ordered and disordered forms in their complexes.

Conclusions:
We have found, in particular, that folding of disordered regions upon binding reduces frustration, but the interactions at the binding interface are not fully optimised. Disordered regions that alternate between folded and disordered forms in different binding modes exhibit a higher degree of frustration in their bound states. These results rationalize specificity without achieving an optimal structure and provide a physical framework for the interaction versatility of disordered regions.
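The three region classes described above can be written as a simple rule over the states a region adopts across its complexes. The following is a minimal sketch under assumptions: the function name and the "ordered"/"disordered" labels are hypothetical, and the authors' actual assignment procedure is not described in the abstract.

```python
def classify_region(bound_states):
    """Classify a region that is disordered in the free form.

    bound_states: iterable of "ordered" / "disordered" labels, one per
    complex in which the region was observed (hypothetical input format).
    Returns "DOR", "DDR" or "CDR".
    """
    states = set(bound_states)
    if not states:
        raise ValueError("at least one bound-state observation is required")
    if not states <= {"ordered", "disordered"}:
        raise ValueError(f"unexpected labels: {sorted(states)}")
    if states == {"ordered"}:
        return "DOR"  # folds upon binding in every observed complex
    if states == {"disordered"}:
        return "DDR"  # stays disordered with every partner
    return "CDR"      # context-dependent: both forms are observed


print(classify_region(["ordered", "disordered", "ordered"]))
```

A region seen as ordered with one partner and disordered with another falls into the context-dependent class, which is the class the abstract associates with a higher degree of frustration.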

FrustratometeR: An R package to calculate energetic local frustration in proteins
COSI: 3DSIG COSI
  • Atilio O. Rausch, Facultad de Ingenieria, Universidad Nacional de Entre Rios, Argentina
  • Leandro G. Radusky, Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Spain
  • Diego Ferreiro, Laboratorio de Fisiología de Proteínas, IQUIBICEN N-CONICET, FCEyN, Universidad de Buenos Aires, Argentina
  • Maria I. Freiberger, Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica IQUIBICEN N-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires; Laboratory of Protein Dynamics, University of Debrecen, Hungary; Center for Theoretical Biological Physics, Rice University, Houston, USA, Argentina
  • Rodrigo Gonzalo Parra, European Molecular Biology Laboratory, Genome Biology Unit, Germany

Short Abstract: Background:
Energetic local frustration has been extensively linked to multiple functional aspects of proteins. The protein frustratometer has been available as a web server since 2012, receiving more than 170 citations so far. Here we present frustratometeR, a standalone R package that extends the set of analyses available on the web server with brand new functionalities to help elucidate the role of local frustration in protein function and dynamics.

Description:
Given a PDB file and a frustration index type, several visualizations and PyMOL scripts are produced. Additionally, the frustratometeR package can compute the frustration index distribution for all alternative amino acids at a given residue to evaluate the impact of point mutations on the structure. A module to analyse frustration changes in Molecular Dynamics simulations is also implemented.

Conclusions:
A standalone version of the frustratometer has long been expected by many users. The frustratometeR not only allows frustration calculations to be performed locally but also extends its functionalities to study the impact of residue mutations and the mechanistic role of frustration during protein dynamics.

Generating Property-Matched Decoy Molecules Using Deep Learning
COSI: 3DSIG COSI
  • Fergus Imrie, University of Oxford, United Kingdom
  • Anthony Bradley, Exscientia Ltd, United Kingdom
  • Mihaela van der Schaar, University of Cambridge, United Kingdom
  • Charlotte Deane, University of Oxford, United Kingdom

Short Abstract: An essential component in the development of structure-based virtual screening methods is the datasets or benchmarks used for training and testing. These typically consist of experimentally verified active molecules together with assumed inactive molecules, known as decoys.
However, the decoy molecules used in such sets have been shown to exhibit substantial bias in basic chemical properties. In some cases, there is evidence to suggest that some structure-based methods are simply exploiting this bias, rather than learning how to perform molecular recognition. The use of biased decoy molecules therefore is preventing generalisation and hindering the development of structure-based virtual screening methods.
We have developed a deep learning method to generate property-matched decoy molecules, called DeepCoy. This eliminates the need to use a database to search for molecules and allows decoys to be generated for the requirements of a particular active molecule. Using DeepCoy generated molecules reduced the bias in basic physicochemical properties of such decoy molecules by 78% and 65% in the DUD-E and DEKOIS 2.0 databases, respectively.
We believe that this substantial reduction in bias will benefit the development and improve generalisation of structure-based virtual screening methods.

GeoMine: A Web-Based Tool for Chemical Three-Dimensional Searching of the PDB
COSI: 3DSIG COSI
  • Joel Graef, Universität Hamburg - Center for Bioinformatics (ZBH), Germany
  • Konrad Diedrich, Universität Hamburg - Center for Bioinformatics (ZBH), Germany
  • Katrin Schöning-Stierand, Universität Hamburg - Center for Bioinformatics (ZBH), Germany
  • Matthias Rarey, Universität Hamburg - Center for Bioinformatics (ZBH), Germany

Short Abstract: The relative arrangement of functional groups and the shape of protein binding sites are the key elements to comprehend a protein’s function. Interactive searching for these three-dimensional patterns is an important tool in life science research, but it is highly challenging from a computational point of view. This problem is addressed by only a few tools, which are limited in terms of query variability, adjustable search sets, retrieval speed and user friendliness. Here, we present GeoMine, a computational approach enabling spatial geometric queries with full chemical awareness on a regularly updated database containing protein-ligand interfaces of the entire PDB. Due to the use of modern algorithms and database technologies, reasonable queries can be searched in up to a few minutes. With a GeoMine query, almost any relative atom arrangement can be searched. GeoMine is implemented as a publicly available web service within ProteinsPlus (proteins.plus). The user interface provides an interactive 3D panel that allows an easy design of queries either from scratch or based on a 3D representation of an existing protein-ligand complex. GeoMine opens a plethora of data analytics opportunities on protein structures, a few of them showcased in this presentation.

HOW CHANGES IN SUGAR COMPOSITION IMPACT GLYCOPROTEIN DYNAMICS? AN EXAMPLE STUDY ON N-GLYCANS IN INSULIN RECEPTOR
COSI: 3DSIG COSI
  • Rajas Rao, Université de Reims Champagne-Ardenne, France
  • Alexandre Guillot, Université de Reims Champagne-Ardenne, France
  • Camille Besançon, Université de Reims Champagne-Ardenne, France
  • Nicolas Belloy, Université de Reims Champagne-Ardenne, France
  • Jessica Jonquet, Université de Reims Champagne-Ardenne, France
  • Manuel Dauchez, Université de Reims Champagne-Ardenne, France
  • Stephanie Baud, Université de Reims Champagne-Ardenne, France

Short Abstract: Glycosylation is among the most common post-translational modifications in proteins, despite the fact that it is observed in only about 10% of all the protein structures in the PDB. Modifications of sugar composition in glycoproteins have a profound impact on the overall physiology of the organism. One such example is the development of insulin resistance, which various experimental studies have attributed to the removal of sialic acid residues from N-glycans of the insulin receptor. How such modifications affect the glycan-glycoprotein dynamics, and ultimately their function, is not clearly understood to date. In this study, we performed molecular dynamics simulations of glycans in different environments. We studied the effects of removal of sialic acid on the glycan, as well as on the dynamics of the leucine-rich repeat L1 region of the insulin receptor ectodomain. We observed perturbations in glycan dynamics as a result of removal of sialic acid, which may ultimately result in changes in the dynamics of insulin and insulin-receptor interactions. Our observations will further aid in understanding the role of sugars in maintaining homeostasis through the example of glycans in the insulin receptor.

How proteins evolved to recognize an ancient nucleotide?
COSI: 3DSIG COSI
  • Aya Narunsky, Yale University, United States
  • Amit Kessel, Tel Aviv University, Israel
  • Ron Solan, Tel Aviv University, Israel
  • Vikram Alva, Max Planck Institute for Developmental Biology, Germany
  • Rachel Kolodny, University of Haifa, Israel
  • Nir Ben Tal, Tel-Aviv University, Israel

Short Abstract: Proteins’ interactions with ancient ligands may reveal how molecular recognition emerged and evolved. We explore how proteins recognize adenine: a planar rigid fragment found in the most common and ancient ligands. We have developed a computational pipeline that extracts protein–adenine complexes from the Protein Data Bank, structurally superimposes their adenine fragments, and detects the hydrogen bonds mediating the interaction. Our analysis extends the known motifs of protein–adenine interactions in the Watson–Crick edge of adenine and shows that all of adenine’s edges may contribute to molecular recognition. We further show that, on the proteins' side, binding is often mediated by specific amino acid segments (“themes”) that recur across different proteins, such that different proteins use the same themes when binding the same adenine-containing ligands. We identify numerous proteins that feature these themes and are thus likely to bind adenine-containing ligands. Our analysis suggests that adenine binding has emerged multiple times in evolution.

(Abstract taken from: Narunsky, A., Kessel, A., Solan, R., Alva, V., Kolodny, R., & Ben-Tal, N. (2020). On the evolution of protein-adenine binding. Proceedings of the National Academy of Sciences of the United States of America, 117(9), 4701–4709. doi.org/10.1073/pnas.1911349117)

Human Genome Topology at the population scale: CTCF and RNAPII-mediated chromatin looping shapes the three-dimensional structure of transcriptional factories
COSI: 3DSIG COSI
  • Michal Denkiewicz, Warsaw University of Technology, Faculty of Mathematics and Information Science; University of Warsaw, CeNT, Poland
  • Michal Wlasnowolski, Warsaw University of Technology, Faculty of Mathematics and Information Science, Poland
  • Michal Kadlof, University of Warsaw, Centre of New Technologies, Poland
  • Kamila Winnicka, University of Warsaw, Centre of New Technologies, Poland
  • Michal Sadowski, University of Warsaw, Centre of New Technologies; University of California Los Angeles, United States
  • Karolina Jodkowska, University of Warsaw, Centre of New Technologies, Poland
  • Kaustav Sengupta, University of Warsaw, Centre of New Technologies, Poland
  • Maciej Borodzik, University of Warsaw, Faculty of Mathematics, Informatics and Mechanics, Poland
  • Yijun Ruan, The Jackson Laboratory for Genomic Medicine, United States
  • Dariusz Plewczynski, Centre of New Technologies, University of Warsaw, Poland

Short Abstract: Recent studies have demonstrated the importance of chromatin spatial organization in human health by comparing genomic interaction patterns between normal and abnormal cells. However, no systematic studies have been published so far on the variability of chromatin topology in the human population of healthy individuals. We combine various types of experimental data and computational modeling to account for genome topological variation within the population and to gain a deeper insight into mechanisms of gene regulation by chromatin topology.
Further, we present a new approach to genome topology assessment employing graph theory. We reveal non-linear properties of the human genome at the population scale using multiple graph-based metrics. The meta-graph is built on chromatin interactions derived from ChIA-PET experiments. The clustering of the meta-graph gives us information about how strongly connected some cliques of DNA segments are.

Finally, the high-quality ChIA-PET interaction data combined with structural variant (deletions, duplications, insertions, inversions) data from the 1000 Genomes Catalogue of Human Genetic Variation and from the GWAS Catalogue enable us to perform genome-wide population-scale analysis of human genome topology. Our results show a close link between variation in chromatin interaction networks (and therefore the imprinted Human Genome Topology, HGT) and differential gene transcription.

iCn3D, a web-based 3D viewer for sharing 1D/2D/3D representations of biomolecular structures
COSI: 3DSIG COSI
  • Jiyao Wang, National Institutes of Health, United States
  • Philippe Youkharibache, National Institutes of Health, United States
  • Dachuan Zhang, National Institutes of Health, United States
  • Christopher Lanczycki, National Institutes of Health, United States
  • Renata Geer, National Institutes of Health, United States
  • Thomas Madej, National Institutes of Health, United States
  • Lon Phan, National Institutes of Health, United States
  • Minghong Ward, National Institutes of Health, United States
  • Shennan Lu, National Institutes of Health, United States
  • Gabriele Marchler, National Institutes of Health, United States
  • Yanli Wang, National Institutes of Health, United States
  • Stephen Bryant, National Institutes of Health, United States
  • Lewis Geer, National Institutes of Health, United States
  • Aron Marchler-Bauer, National Institutes of Health, United States

Short Abstract: iCn3D (I-see-in-3D) is a web-based 3D molecular structure viewer focusing on interactive structural analysis. It can simultaneously show 3D structure, 2D molecular contacts and 1D protein and nucleotide sequences through an integrated sequence/annotation browser. Pre-defined and arbitrary molecular features can be selected in any of the 1D/2D/3D windows as sets of residues and these selections are synchronized dynamically in all displays. Biological annotations such as protein domains, single nucleotide variations, etc. can be shown as tracks in the 1D sequence/annotation browser. These customized displays can be shared with colleagues or publishers via a simple URL. iCn3D can display structure–structure alignments obtained from NCBI’s VAST+ service. It can also display the alignment of a sequence with a structure as identified by BLAST, and thus relate 3D structure to a large fraction of all known proteins. iCn3D can also display electron density maps or electron microscopy (EM) density maps, and export files for 3D printing. The following example URL exemplifies some of the 1D/2D/3D representations: www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?mmdbid=1TUP&showanno=1&show2d=1&showsets=1. Its source code is available at github.com/ncbi/icn3d.
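The shareable URL mentioned above is a base address plus query parameters. As a minimal sketch, the example URL can be rebuilt from its parts as follows; the `https://` scheme and the helper name `icn3d_url` are assumptions, and only the four parameters that appear in the example URL are used.

```python
from urllib.parse import urlencode

# Base address taken from the example URL; the https scheme is an assumption.
ICN3D_BASE = "https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html"


def icn3d_url(mmdbid, showanno=True, show2d=True, showsets=True):
    """Build an iCn3D link for a structure ID (hypothetical helper).

    A flag is written as name=1 when enabled and left out otherwise,
    because the abstract does not say how other values are interpreted.
    """
    params = {"mmdbid": mmdbid}
    flags = (("showanno", showanno), ("show2d", show2d), ("showsets", showsets))
    for name, enabled in flags:
        if enabled:
            params[name] = 1
    return f"{ICN3D_BASE}?{urlencode(params)}"


print(icn3d_url("1TUP"))
```

With the default flags this reproduces the parameter string of the example URL for structure 1TUP.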

Identification of and structural insights into the multistage antimalarial target Plasmepsin X from Plasmodium falciparum (PfPlmX)
COSI: 3DSIG COSI
  • Cissé Cheickna, African Center of Excellence in Bioinformatics (ACE-B) / University of Sciences, Techniques and Technologies (USTTB), Mali
  • Mamadou Sangare, African Center of Excellence in Bioinformatics (ACE-B)/ University of Sciences, Techniques and Technologies (USTTB), Mali
  • Alia Benkahla, Laboratory of Bioinformatique, biomathématiques, biostatistiques (Bims) / Institut Pasteur de Tunis, Tunisia
  • Jeffrey Shaffer, School of Public Health and Tropical Medicine / Tulane University, United States
  • Seydou Doumbia, University of clinical Research Center (UCRC) / University of Sciences, Techniques and Technologies (USTTB), Mali
  • Mamadou Wele, African Center of Excellence in Bioinformatics (ACE-B)/ University of Sciences, Techniques and Technologies (USTTB), Mali

Short Abstract: Plasmodium falciparum is a protozoan parasite responsible for the most severe and deadly form of malaria. The threat of emerging resistance of this parasite to last-resort antimalarial drugs is jeopardizing recent progress in the fight against malaria. There is a need to take the lead in identifying new targets and small molecules as therapeutic drug precursors for the development of new antimalarials. Given the urgency, the tools of bioinformatics are the most appropriate to predict therapeutic candidates in a reasonable time frame and at a lower cost. Thus, using TDRTargets.org, we have screened 17 parasite proteins with interesting characteristics as therapeutic targets. The analysis of the interaction network of these proteins allowed us to select the most promising target and to build its models by homology via I-TASSER and SWISS-MODEL. Here, we propose the model of the well-known multistage target Plasmodium falciparum plasmepsin X (PfPlmX), refined with Molecular Dynamics (MD) simulation using NAMD 2.9. Despite its proven involvement in the mechanism of red blood cell infection, there is currently no 3D structure of PfPlmX. This model could therefore contribute to the design of new inhibitors, potential precursors of a future antimalarial drug.

IGAP- Integrative Genome Analysis Pipeline Reveals New Gene Regulatory Model Associated with Nonspecific TF-DNA Binding Affinity
COSI: 3DSIG COSI
  • Alireza Naeini, Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway, Norway
  • Amna Farooq, Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway, Norway
  • Magnar Bjoras, Institute for Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway, Norway
  • Junbai Wang, Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway, Norway

Short Abstract: The human genome is regulated in a multi-dimensional fashion. While biophysical factors like Non-specific Transcription factor Binding Affinity (nTBA) act at the DNA sequence level, other factors, such as histone modifications and 3-D chromosomal interactions, act above the sequence level. This multidimensionality of regulation requires employing many of these factors for a proper understanding of the regulatory landscape of the human genome. Here, we propose a new biophysical model for estimating nTBA. Integration of nTBA with chromatin modifications and chromosomal interactions, using a new Integrative Genome Analysis Pipeline (IGAP), reveals additive effects of nTBA to regulatory DNA sequences and identifies three types of genomic zones in the human genome (Inactive Genomic Zones, Poised Genomic Zones, and Active Genomic Zones). It also unveils a novel long-distance gene regulatory model: chromosomal interactions reduce the physical distance between the high occupancy target (HOT) regions, which results in high nTBA to DNA in the area, which in turn attracts TFs to such regions having higher binding potential. These findings will help to elucidate the three-dimensional diffusion process that TFs use during their search for the right targets.

In silico ensemble modeling suggests binding-induced expansion as a possible functional mechanism for two endocytic proteins.
COSI: 3DSIG COSI
  • N. Suhas Jagannathan, National University of Singapore, Singapore
  • Christopher W. V. Hogue, Mechanobiology Institute, NUS. Current Address: Global AI Accelerator, Santa Clara, CA., United States
  • Lisa Tucker-Kellogg, Duke NUS Medical School, Singapore, Singapore

Short Abstract: Intrinsically disordered regions (IDRs) are known to function as linkers or through folding-upon-binding. In this work, we explore the possibility of binding-induced expansion, a mechanism where binding of a partner to an IDR results in either a local or a global expansion of the steric volume occupied by the IDR. We focus on the IDRs of Epsin and Eps15 from Clathrin-mediated endocytosis (CME), both of which contain multiple binding motifs for another CME protein, AP2. We generated large conformational ensembles for Epsin and Eps15 IDRs and studied how the dimensions and energetics of the ensembles varied when bound to increasing numbers of AP2 molecules. Our results showed that Epsin-IDR and Eps15-IDR behave differently upon AP2 binding. Epsin-IDR undergoes binding-induced global expansion, a mechanism where AP2 binding causes a concurrent increase in the steric volume occupied by the energetically stable members of the ensemble. This results in molecular crowding of Epsin-IDR at the endocytic hotspot, which could help remodel the plasma membrane during endocytosis. In contrast, Eps15-IDR undergoes binding-induced local expansion, a mechanism where the binding of AP2 at one motif in the IDR makes other motifs more accessible for binding further AP2 molecules, allowing Eps15-IDR to function as an AP2 recruiter during endocytosis.

In silico Investigation of the Mechanism of Transmembrane Transfer of Cholesterol by NPC1
COSI: 3DSIG COSI
  • Shelby Baker, Presbyterian College, United States
  • Marharyta Petukh, Presbyterian College, United States

Short Abstract: NPC1 is a large transmembrane multidomain protein located in the lysosomes and late endosomes which is responsible for cholesterol (CLR) transfer from vesicles to the endoplasmic reticulum and other cellular compartments. In humans, changes in the protein's activity can cause Niemann-Pick type C disease, which is associated with failure of CLR delivery from endosomes/lysosomes to the appropriate cellular compartments. Previously it was proposed that the sterol sensing domain (SSD), transmembrane helices 3-7 of NPC1, can be involved in CLR transfer through the membrane by forming a pore. We tested this hypothesis by applying evolutionary and in silico analysis to study the structural and functional activity of SSD. With steered molecular dynamics simulations, we found that CLR can indeed be transferred through SSD. We detected the only possible path of CLR transfer through the pore, which is aligned with highly conserved residues. The free motion of CLR through the pore is limited by transient specific interactions between the ligand and residues of the protein. The substitution of one of these residues (D771A) reduces the SSD stability, which can affect initial CLR binding, and causes a significant deviation in CLR passage through the membrane compared to the wild-type protein.

In silico selection of RNA aptamers for a target protein based on discriminative classifiers and the Monte-Carlo tree search
COSI: 3DSIG COSI
  • Gwangho Lee, Pusan National University, South Korea
  • Giltae Song, Pusan National University, South Korea

Short Abstract: Aptamers are polynucleotide or peptide chains folded into a stable structure and useful for therapeutic applications. SELEX (systematic evolution of ligands by exponential enrichment) is one of the experimental methods to generate aptamers, but it is expensive and time-consuming. Recently, there have been some in silico attempts to reduce the cost, such as an approach based on discriminative classifiers. Some of these methods do generate sequences that bind to a target protein, but they produce only low-quality candidates of a specific size.
In this study, we develop an approach based on the Monte-Carlo Tree Search (MCTS) algorithm for generating the sequences using a score function computed by a discriminative classifier. We evaluate our approach based on three metrics: the minimum free energy (MFE) of aptamer structures, scores in docking simulations, and TM-score_RNA structure similarity scores. Our model shows quite a similar MFE to real aptamers and similar to or better docking scores than other existing methods in the ZDOCK docking simulations. Most of our samples obtain TM-score_RNA scores higher than 0.17 (< 0.17 indicates unrelated random sequences). We believe that our study can substantially reduce the cost and time for generating aptamers.
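To illustrate the general idea of a tree search guided by a score function, the following is a minimal sketch of Monte-Carlo tree search that builds an RNA sequence one base at a time. It is not the authors' implementation: `toy_score` is a hypothetical stand-in for the discriminative classifier (it simply rewards G/C content), and the names, the exploration constant and the iteration count are all assumptions.

```python
import math
import random

ALPHABET = "ACGU"


def toy_score(seq):
    """Hypothetical stand-in for a classifier score: fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


class Node:
    def __init__(self, seq, parent=None):
        self.seq = seq        # partial sequence represented by this node
        self.parent = parent
        self.children = {}    # base -> child Node
        self.visits = 0
        self.total = 0.0      # sum of rewards backed up through this node


def uct(parent, child, c=1.4):
    """Upper confidence bound: mean reward plus an exploration bonus."""
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def mcts_generate(length, score=toy_score, iterations=200, rng=None):
    """Return the best full-length sequence seen during the search."""
    if length < 1:
        raise ValueError("length must be at least 1")
    rng = rng or random.Random(0)
    root = Node("")
    best_seq, best_score = None, -math.inf
    for _ in range(iterations):
        node = root
        # Selection: follow UCT while the node is fully expanded.
        while len(node.seq) < length and len(node.children) == len(ALPHABET):
            node = max(node.children.values(), key=lambda ch: uct(node, ch))
        # Expansion: add one untried base unless the sequence is complete.
        if len(node.seq) < length:
            base = rng.choice([b for b in ALPHABET if b not in node.children])
            child = Node(node.seq + base, parent=node)
            node.children[base] = child
            node = child
        # Simulation: complete the sequence at random and score it.
        tail = "".join(rng.choice(ALPHABET) for _ in range(length - len(node.seq)))
        rollout = node.seq + tail
        reward = score(rollout)
        if reward > best_score:
            best_seq, best_score = rollout, reward
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return best_seq, best_score


seq, value = mcts_generate(12)
print(seq, round(value, 2))
```

Replacing `toy_score` with a trained classifier's predicted binding score would give the kind of guided search the abstract describes; because candidates are built base by base, the target length is a free parameter rather than a fixed size.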

IN SILICO STUDIES FOR SEARCHING NEW ARYLALKYLAMINE N-ACETYLTRANSFERASE (AANAT) INHIBITORS OF Aedes Aegypti MOSQUITO
COSI: 3DSIG COSI
  • Maria Angélica Bomfim Oliveira, Universidade Estadual do Sudoeste da Bahia, Brazil
  • Bruno Silva Andrade, Universidade Estadual do Sudoeste da Bahia, Brazil

Short Abstract: The Aedes aegypti mosquito is the main vector of Dengue, Chikungunya Fever, and Zika Virus infection, as well as other arboviruses. Dengue is the main arbovirus that affects Brazil and is one of the major public health problems in this country. The arylalkylamine N-acetyltransferase (aaNAT) is an essential enzyme in the process of cuticle sclerotization and mosquito development. The present work consisted of a theoretical study of an in silico ligand screening for new aaNAT inhibitors, using the following approach: i) homology modeling of the aaNAT 3D structure with the automated SWISS-MODEL (swissmodel.expasy.org/) workspace, using the dopamine N-acetyltransferase (PDB code 3V8I) as a template; ii) virtual screening based on the structure of the aaNAT natural substrate to search for novel ligands in the ZINC15 database (zinc15.docking.org/); iii) molecular docking studies with AutoDock Vina to select the ligands with the best affinity energies in complexes with the target receptor, in comparison to the natural substrate; iv) and finally, in silico studies of ligand toxicity, in order to propose the best molecules that can be tested as new insecticide candidates against A. aegypti.

In silico study of potential inhibitors against Moniliophthora roreri Cyclophilin A
COSI: 3DSIG COSI
  • Fernanda Rangel, Universidade Estadual de Santa Cruz, Brazil
  • Andria Freitas, Universidade Estadual de Santa Cruz, Brazil
  • Bruno Andrade, Universidade Estadual de Santa Cruz, Brazil

Short Abstract: Moniliophthora roreri is the causative agent of moniliasis of Theobroma cacao, the cocoa tree. This basidiomycete is selective to the plant's fruit, and over decades it has been causing severe losses in cocoa production that can reach up to 90%. In vivo interaction studies have shown increased expression of the cyp gene in this fungus, which is fundamental in fungal reproduction and pathogenicity. Thus, Cyclophilin A proved to be a potential target for new fungicides. In this work, in silico and in vitro analyses were performed. A search for ligands was carried out using the ZINC database, in addition to the PharmaGist server for pharmacophore modeling. The MpCYPA structure was modeled using the SWISS-MODEL server, and the initial structure was fully minimized using the AMBER 14 package, through 5000 steepest descent and 5000 conjugate gradient minimization cycles, to resolve steric clashes. Docking calculations were performed with AutoDock Vina and 2D ligand interaction maps were built with Discovery Studio 2.5. At the end of our research, we found three small molecules that can be tested in vitro and in vivo to check their potential as new fungicides against M. roreri.

In silico study of potential inhibitors for the TOR protein (Target of Rapamycin) of Moniliophthora perniciosa
COSI: 3DSIG COSI
  • Andria Freitas, Universidade Estadual de Santa Cruz-UESC, Brazil
  • Fernanda Rangel, Universidade Estadual de Santa Cruz, Brazil
  • Bruno Andrade, Universidade Estadual do Sudoeste da Bahia, Brazil

Short Abstract: Moniliophthora perniciosa is a basidiomycete that causes witches' broom disease in Theobroma cacao L. (cacao). In silico studies of the interaction and inhibition of proteins involved in the development of M. perniciosa raised perspectives on the comprehension of the autophagy process mediated by the TOR (Target of Rapamycin) signaling pathway. This work aimed to characterize target proteins and identify inhibitors of MpTOR, as well as to propose interaction models through molecular docking studies. Ligand searching was performed using the ZINC and DrugBank databases. The MpTOR structure was modeled with the SWISS-MODEL server, and the initial structure was then fully energy-minimized with AMBER 14, using 5000 cycles of steepest descent and conjugate gradient minimization to resolve protein clashes. Docking calculations were performed with AutoDock Vina, and the 2D ligand interaction maps were constructed with Discovery Studio 2.5. In vitro analyses were also carried out: the FKB-rapamycin domain of MpTOR was expressed in a heterologous system, and its secondary structure was analyzed by circular dichroism (CD) spectroscopy. The inhibitory potential of the ligands selected in the in silico studies was confirmed by in vitro experiments, and our results suggest that these molecules have potential for MpTOR inhibition and can be used for controlling the infection and witches' broom disease.

Investigating the Relationship Between Structural Symmetry and Function in Membrane Proteins
COSI: 3DSIG COSI
  • Emily Yaklich, Computational Structural Biology Unit, National Institute of Neurological Disorders and Stroke, NIH, United States
  • Antoniya Aleksandrova, Computational Structural Biology Unit, National Institute of Neurological Disorders and Stroke, NIH, United States
  • Lucy Forrest, NINDS - NIH, United States

Short Abstract: Membrane proteins are encoded by 20-30% of genomes and play key roles in a number of diverse functions such as signaling, scaffolding, and transporting molecules. Many of the available structures of membrane proteins exhibit symmetry and pseudo-symmetry both between subunits and within a subunit. The abundance of symmetry is not only striking, but also important because structurally symmetric regions often relate to functional properties of a protein. However, this symmetry-function relationship has not been studied systematically in membrane proteins to date. To establish the role of symmetry in membrane protein function, we began by quantifying the types of symmetry and pseudo-symmetry described in the Encyclopedia of Membrane Proteins Analyzed by Structure and Symmetry (EncoMPASS) database. We found that identical subunits almost always assemble into symmetric membrane protein complexes, as might be expected, while over half of membrane protein structures contain a pseudo-symmetry within a subunit. In order to understand the functions associated with specific symmetry features, a symmetry enrichment analysis was performed using the Gene Ontology annotations for each structure. This study lays the groundwork for further investigation of the symmetry-function relationship in membrane proteins, including comparison with a similar study of water-soluble proteins.

KinPhyCoRe: A web resource of protein kinase sequence, structure and phylogeny
COSI: 3DSIG COSI
  • Vivek Modi, Fox Chase Cancer Center, Philadelphia, United States
  • Roland Dunbrack, Fox Chase Cancer Center, United States

Short Abstract: The classification of protein kinase (PK) catalytic site conformations is crucial for understanding their dynamics and for structure-based drug design. In our previous work, we clustered and labeled all the human PK structures based on the backbone and side chain dihedrals of the conserved DFG motif. We also created a structurally validated multiple sequence alignment of 497 human kinase domains. We used it to derive a revised phylogenetic tree, which reclassified 10 PKs as CAMKs that were previously grouped as OTHER. Here, we present KinPhyCoRe (dunbrack.fccc.edu/kinphycore), a web resource that provides access to the conformational assignments of all human PKs. The database can be browsed for all entries at once or searched by PDB ID or gene, or by a combination of kinase group name (e.g., AGC, CAMK) and conformational cluster (e.g., DFGin-BLAminus). The website design includes separate hyperlinked pages for kinase groups, their genes, structures, conformational clusters and ligands, with representative structures displayed in 3D. Users can also upload a PK structure to the web server, which will identify its conformation. We also provide access to the alignment of human PKs and the phylogenetic tree. The entire database can be downloaded as text files, as can renumbered PDB files and PyMOL sessions.

Local Structural Similarity of Mononucleotide Binding Sites in Different Levels of SCOPe Classification
COSI: 3DSIG COSI
  • Shota Kawakami, Graduate School of Life Sciences, Tohoku University, Japan
  • Hafumi Nishi, Graduate School of Information Sciences, Tohoku University, Japan
  • Kengo Kinoshita, Graduate School of Information Sciences, Tohoku University, Japan

Short Abstract: Examining the local structures of ligand-binding sites is important for elucidating how proteins recognize ligands. Although the amount of protein structural data registered in the PDB has been increasing year by year, few studies have performed an exhaustive comparison of ligand-binding site structures at the atomic level with the latest data.
In this study, we performed an all-against-all comparison of mononucleotide binding sites at the atomic level for two different datasets: earlier and updated. The earlier dataset includes the proteins registered by 2000, whereas the updated dataset covers those registered by 2017.
The results clearly illustrated that proteins from the same superfamily showed high similarity in binding site structure, while proteins from different superfamilies showed low similarity. It was also demonstrated that proteins from the same “Protein” level in SCOPe showed much higher structural similarity of binding sites than those grouped at other levels. In both the earlier and updated datasets, the local structural similarity of binding sites was confirmed to correspond to the global similarity of proteins. In addition, the results suggest that the degree of similarity in ligand-binding sites reflects the structural similarity level in the SCOPe classification hierarchy.

Modeling of G protein-coupled receptor structures: Improving the prediction of loop conformations and the usability of models for structure-based drug design
COSI: 3DSIG COSI
  • Bhumika Arora, Indian Institute of Technology Bombay, Monash University, and IITB-Monash Research Academy, India

Short Abstract: G protein-coupled receptors (GPCRs) form the largest group of potential drug targets, and knowledge of their three-dimensional structure is therefore important for rational drug design. Homology modeling is a common approach for modeling the transmembrane helical cores of GPCRs; however, these models have varying degrees of inaccuracy that depend on the quality of the template used. We have explored the extent to which inaccuracies inherent in homology models of the transmembrane helical cores of GPCRs can impact loop prediction. We found that loop prediction in GPCR models is much more difficult than loop reconstruction in crystal structures, owing to the imprecise positioning of loop anchors. Therefore, minimizing the errors in loop anchors is likely to be critical for optimal GPCR structure prediction. To address this, we have developed a Ligand Directed Modeling (LDM) method comprising geometric protein sampling and ligand docking. The method was evaluated for its capacity to refine GPCR models built across a range of templates with varying degrees of sequence similarity to the target. LDM reduced the errors in loop anchor positions and improved the prediction of ligand binding poses, resulting in much better performance of these models in virtual ligand screening.

Modelling Structural Rearrangements using Euclidean Distance Matrix Completion
COSI: 3DSIG COSI
  • Aleix Lafita-Masip, European Bioinformatics Institute EMBL-EBI, United Kingdom
  • Alex Bateman, European Bioinformatics Institute EMBL-EBI, United Kingdom

Short Abstract: Proteins undergo large-scale structural rearrangements, such as circular permutations, dimerisation via domain swapping, and loss of core secondary structure elements in domain atrophy, among others. Thanks to conserved native residue contacts at the protein core, these structural changes can be naturally represented as distance matrix transformations. Here we present an approach to formulate structural rearrangements as a Euclidean Distance Matrix Completion (EDMC) problem and use it to build their 3D models. EDMC modelling aims to be intuitive, flexible and fast. Models are solely based on protein geometry and if needed can be further used by other protein modelling tools to include energetic constraints. We demonstrate various applications of EDMC modelling in protein structure analysis and its integration into the TADOSS method (TAndem DOmain Swap Stability predictor: github.com/lafita/tadoss).
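
To make the idea of a rearrangement as a distance matrix transformation concrete, the sketch below expresses a circular permutation as a reordering of the rows and columns of a pairwise distance matrix. It is a minimal illustration, not the EDMC method itself: the function names and toy coordinates are hypothetical, and in a real completion problem the entries affected by the new linker and termini would be treated as unknown and completed, a step not shown here.

```python
from math import dist


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue positions."""
    return [[dist(a, b) for b in coords] for a in coords]


def circular_permutation(dmat, cut):
    """Reorder a distance matrix so that residue `cut` becomes the new first
    residue and the old N-terminal segment follows the old C-terminus."""
    n = len(dmat)
    order = [(cut + i) % n for i in range(n)]
    return [[dmat[i][j] for j in order] for i in order]


# Toy example: three residues on a line (hypothetical coordinates, in angstroms).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
permuted = circular_permutation(distance_matrix(coords), cut=1)
print(permuted[0])
```

Because the native contacts are conserved, the permuted matrix keeps the same set of distances and only the residue indexing changes.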

Nanocapsule Designs for Antimicrobial Resistance
COSI: 3DSIG COSI
  • Irene Marzuoli, King's College London, United Kingdom
  • Carlos Cruz, Instituto de Tecnologia Química e Biológica António Xavier (ITQB), Portugal
  • F Fraternali, Randall Division of Cell and Molecular Biophysics, King’s College London, United Kingdom

Short Abstract: Antimicrobial resistance and drug delivery have been major focuses of recent medical research. Recently engineered virus-like nanocapsules derived from synthetic multi-branched peptides have been shown to promote bacterial membrane poration and, at the same time, to be suitable for gene delivery [1].
The atomistic details of the nanocapsule assembly, necessary for the antimicrobial and gene delivery activities, are not accessible to experimental techniques. Therefore, the stability of the nanocapsule in water and its interaction with a model membrane were studied through Molecular Dynamics simulations, and the results were compared with the available experimental data [2].
Integrated results from simulations at different resolutions highlighted the role of the amphiphilic structure of capzip as the driving promoter of assembly stability. Moreover, the simulations highlighted a strong affinity for a bacterial model membrane and a lower affinity for a mammalian one. This results in bacterial membrane poration in the presence of an electric field, a process triggered by the insertion of arginine residues, which are abundant in the structure. This investigation shows the essential role of computational techniques in rationalizing the experimental results and suggests how to manipulate capzip composition in order to trigger particular functions.

1. Chem. Sci., 7(3):1707–1711, 2016.
2. ACS Nano, 14(2):1609-1622, 2020.

Nature of long-range evolutionary constraint in enzymes: Insights from comparison to non-catalytic ligand binding sites
COSI: 3DSIG COSI
  • Avital Sharir-Ivry, McGill University, Israel
  • Yu Xia, McGill University, Canada

Short Abstract: Quantitative evolutionary design principles of enzymes remain elusive on the proteomic scale. Recent studies have uncovered a remarkably long-range evolutionary constraint in enzyme structure, in which the site-specific evolutionary rate increases with distance from the catalytic site, affecting even distant sites. Counterpart pseudoenzymes that share the same protein fold but are catalytically inactive exhibit a significantly reduced conservation gradient, showing that the three-dimensional structure of the enzyme does not dictate its unique long-range constraint. Searching for the origin of the evolutionary constraint, we systematically studied the magnitude of conservation gradients induced by different types of functional sites in enzymes and other proteins: catalytic sites, non-catalytic ligand binding sites, allosteric binding sites, and protein-protein interaction sites. We show that catalytic sites induce significantly stronger conservation gradients than all types of non-catalytic binding sites. Notably, the weak conservation gradient induced by non-catalytic binding sites in enzymes is nearly identical in magnitude to that induced by ligand binding sites in non-enzymes. Our results show that the unique constraint from catalytic sites in enzymes is likely driven by the optimization of catalysis rather than by ligand binding and allosteric functions. These results shed light on the structural and functional determinants of enzyme evolution.
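
One simple way to quantify a conservation gradient of the kind described above is the slope of a least-squares line relating site-specific evolutionary rate to distance from the functional site. The abstract does not state how the gradient magnitude was measured, so the linear fit, the function name, and the example numbers below are assumptions for illustration only.

```python
def conservation_gradient(distances, rates):
    """Least-squares slope of evolutionary rate versus distance from a
    functional site (an illustrative definition, not the authors' method)."""
    n = len(distances)
    if n < 2 or n != len(rates):
        raise ValueError("need two or more paired observations")
    mean_x = sum(distances) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    if sxx == 0:
        raise ValueError("all distances are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, rates))
    return sxy / sxx


# Hypothetical data: rate rises with distance (angstroms) from the site.
print(conservation_gradient([0.0, 10.0, 20.0], [0.5, 1.0, 1.5]))
```

A steeper positive slope would correspond to a stronger gradient, so slopes computed around catalytic and non-catalytic sites could be compared directly.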

Network analysis of synonymous codon usage
COSI: 3DSIG COSI
  • Khalique Newaz, University of Notre Dame, United States
  • Gabriel Wright, University of Notre Dame, United States
  • Jacob Piland, University of Notre Dame, United States
  • Jun Li, University of Notre Dame, United States
  • Patricia Clark, University of Notre Dame, United States
  • Scott Emrich, University of Tennessee, United States
  • Tijana Milenkovic, University of Notre Dame, United States

Short Abstract: Most amino acids are encoded by multiple codons, some of which are used more rarely than others. Analyses of the positions of such rare codons in protein sequences have revealed that rare codons can impact protein folding and that the positions of some rare codons are evolutionarily conserved. Analyses of their positions in protein 3-dimensional structures, which are biochemically richer than sequences alone, might further explain the role of rare codons in protein folding. We model protein structures as networks and use network centrality to measure the structural position of an amino acid. We first validate that amino acids buried within the protein’s core are network-central and that those on the surface are not. We then study potential differences between the network, and thus structural, positions of amino acids encoded by evolutionarily conserved rare, evolutionarily non-conserved rare, and commonly used codons. In 84% of our proteins, the three codon categories occupy significantly different structural positions. We examine protein groups showing different relationships between the structural positions of the three codon categories. Several of the groups show interesting structural or functional characteristics. Our work provides evidence that codon usage is linked to the final protein 3D structure and thus potentially to co-translational protein folding.
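
The sketch below shows one common way to turn a structure into a network and measure how central a residue is: residues become nodes, residue pairs within a distance cutoff become edges, and closeness centrality is computed by breadth-first search. The abstract does not specify the contact definition or the centrality measure, so the 7.5 Å cutoff, the use of one point per residue, and the choice of closeness centrality are all assumptions.

```python
from collections import deque
from math import dist


def contact_network(residue_coords, cutoff=7.5):
    """Adjacency sets linking residues whose representative atoms lie within
    `cutoff` angstroms (the cutoff value is an illustrative assumption)."""
    n = len(residue_coords)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist(residue_coords[i], residue_coords[j]) <= cutoff:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def closeness_centrality(adjacency, source):
    """Reachable-node count divided by the sum of shortest-path lengths."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    total = sum(hops.values())
    return (len(hops) - 1) / total if total else 0.0


# Toy chain of three residues spaced 5 angstroms apart (hypothetical).
network = contact_network([(0, 0, 0), (5, 0, 0), (10, 0, 0)])
print(closeness_centrality(network, 1), closeness_centrality(network, 0))
```

In this toy chain the middle residue is the most central, mirroring the expectation that buried residues score higher than surface ones.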

Novel in-silico skin sensitization prediction model for cosmetic ingredients by analysing their affinity towards CD54/CD86 receptors
COSI: 3DSIG COSI
  • Sarra Akermi, annotation analytics pvt. ltd., India
  • Sunil Jayant, annotation analytics pvt. ltd., India

Short Abstract: The human Cell Line Activation Test (h-CLAT) is an OECD-adopted cell-based assay that contributes to the assessment of the skin sensitisation potential of skin care ingredients. The method addresses KE3 of the skin sensitisation Adverse Outcome Pathway (AOP) by quantifying changes in the expression of cell surface receptors (CD54 and CD86) associated with the activation of monocytes and dendritic cells (DC). To achieve a significant reduction in time and cost for large-scale cosmetic screening, we propose an in-silico skin sensitization prediction model that assesses the interaction between cosmetic compounds and the CD54 and CD86 receptors. Following the h-CLAT OECD guideline 442E, four positive (++) and four negative (-) chemicals with known skin sensitization effects were selected to test the concept. The eight compounds were screened against CD54 and CD86 using molecular docking. The docking results revealed that the positive cosmetic ingredients, i.e., Imidazolidinyl urea, Hydroxycitronellal, Chloramine T and 2,4-Dinitrochlorobenzene (DNCB), were predicted to have stronger affinities, while the negative skin sensitizers, Glycerol, Lactic acid, 1-Butanol and Vanillin, showed weaker affinities for CD54 and CD86, as expected. Our computational results provide a proof of concept for the method, in agreement with OECD guideline 442E. We conclude that the proposed in-silico method offers a strong alternative or complement to experimental pre-screening and skin sensitization potency evaluation of cosmetic compounds.

Predicting changes in protein thermostability upon point mutation with deep 3D convolutional neural networks
COSI: 3DSIG COSI
  • Bian Li, Yale University, United States
  • Yucheng Yang, Yale University, United States
  • John Capra, Vanderbilt University, United States
  • Mark Gerstein, Yale University, United States

Short Abstract: Predicting mutation-induced changes in protein thermostability (ΔΔG) is of great interest in protein engineering, variant interpretation, and drug discovery. We introduce ThermoNet, a deep 3D-convolutional neural network designed for structure-based prediction of ΔΔG upon point mutation. To naturally leverage the image-processing power inherent in convolutional neural networks, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. ThermoNet is trained with a data set balanced with direct and reverse mutations generated by symmetry-based data augmentation. It demonstrates improved performance compared to fifteen previously developed computational methods on a widely used blind test set. Unlike all other methods that exhibit a strong bias towards predicting destabilization, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔG landscape for two clinically relevant proteins, p53 and myoglobin, and ClinVar missense variants. Overall, our results suggest that 3D convolutional neural networks can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.
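
The symmetry exploited by this kind of data augmentation is that reversing a point mutation negates its stability change, so each direct mutation yields a reverse example with the sign of ΔΔG flipped. The sketch below illustrates only that bookkeeping. The record layout and the structure labels are hypothetical, and the actual pipeline would also need a structural model of the mutant to serve as the starting structure of the reverse mutation.

```python
def augment_with_reverse(records):
    """Return the direct mutations plus their reverse counterparts.

    Each record is (start_structure, end_structure, ddg), where the two
    structure labels are hypothetical identifiers. The reverse mutation swaps
    the two structures and negates ddg.
    """
    augmented = []
    for start, end, ddg in records:
        augmented.append((start, end, ddg))
        augmented.append((end, start, -ddg))
    return augmented


# Hypothetical record: wild type to mutant with a ddG of 1.5 kcal/mol.
balanced = augment_with_reverse([("wt_model", "mut_model", 1.5)])
print(balanced)
```

A data set built this way is balanced by construction: the ΔΔG values of the augmented set sum to zero, which removes any built-in bias towards one sign.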

Prediction and Characterization of Disorder-Order Transition Regions in Proteins by Deep Learning
COSI: 3DSIG COSI
  • Ziang Yan, Tohoku University, Japan
  • Satoshi Omori, Tohoku University, Japan
  • Kazunori Yamada, Tohoku University, Japan
  • Hafumi Nishi, Tohoku University, Japan
  • Kengo Kinoshita, Tohoku University, Japan

Short Abstract: Motivation: Much experimental evidence has shown that disordered protein regions have crucial biological roles. In some of these regions, disorder-order transitions are also involved in various biological processes, such as protein-protein interaction and ligand binding. Owing to the cost and time required for experimental identification of natively disordered or transitional regions, the development of effective computational methods is a key research goal. In this study, we used overall residue dependencies and representation learning for prediction and reused the obtained disorder information for the prediction of disorder-order transitions.
Results: We developed a novel deep learning method, Res-BiLstm, for residue-wise disordered region prediction. Our method outperformed other predictors with respect to almost all criteria, as evaluated using an independent test set. For disorder-order transition prediction, we proposed a transfer learning method, Res-BiLstm-NN, with an acceptable but unbalanced performance, yielding reasonable results. To grasp the underlying biophysical principles of disorder-order transitions, we performed qualitative analyses on the obtained results and discovered that most transitions have strong disordered or ordered preferences, and that more transitions are consistent with the ordered state than with the disordered state, contrary to conventional wisdom. To the best of our knowledge, this is the first sizable-scale study of transition prediction.

ProtCID: A data resource for structural information on protein interactions
COSI: 3DSIG COSI
  • Qifang Xu, Fox Chase Cancer Center, United States
  • Roland Dunbrack, Fox Chase Cancer Center, United States

Short Abstract: Structural information on the interactions of proteins with other molecules is plentiful, and for some proteins and protein families, there may be hundreds or even thousands of available structures. It can be very difficult for a scientist who is not trained in structural bioinformatics to access this information comprehensively. Previously, we developed the Protein Common Interface Database (ProtCID), which provided clusters of the interfaces of full-length protein chains as a means of verifying or suggesting biological assemblies, which differ from crystallographic asymmetric units about 40% of the time. Because proteins consist of domains that act as modular functional units and are often recombined in different genes, we have extended the analysis in ProtCID to the individual domain level. This has greatly increased the number of large protein-protein clusters in ProtCID, enabling the generation of hypotheses on the structures of biological assemblies of many systems. The analysis of domain families allows us to extend ProtCID to the interactions of domains with peptides, nucleic acids, and ligands. ProtCID provides complete annotations and coordinate sets for every cluster.

Protein local conformations analyses in ordered and intrinsically disordered proteins in the light of a structural alphabet
COSI: 3DSIG COSI
  • Alexandre G. De Brevern, Université de Paris - INSERM UMR-S 1134 - INTS - DSIMB Team, France

Short Abstract: Proteins are highly dynamic macromolecules. Molecular dynamics (MD) simulations were performed on 169 representative protein domains. Classical secondary structures were explored. Concerning the helical structures, only 76.4% of the residues associated with α-helices retained the conformation; this tendency dropped to 40% for 3₁₀- and π-helices (Narwani et al, Arch Biol Sci, 2018). The rigidity of β-sheets was confirmed, although they showed a capacity to transform into turns. Finally, turns converted easily to helical structures, while bends preferred the extended conformations. The Protein Blocks structural alphabet (PBs, de Brevern et al, Proteins, 2000) showed that the majority of PBs remain in their original conformation with high frequency. A few PBs have a higher tendency to be flexible. The intriguing fact was that the change from one PB to another did not correspond to a simple geometrical evolution. It was more frequent to go to an unexpected PB than to an expected one (Narwani et al, J Biomol Struct Dyn, 2019). Disordered protein ensembles were analysed with PBs, making it possible to quantify the continuum from rigidity to flexibility and finally disorder (Melarkode Vattekatte et al, J Struct Biol, 2020, Data in Brief, 2020). These results have been compared to different types of prediction.

QDeep: distance-based protein model quality estimation by residue-level ensemble error classifications using stacked deep residual neural networks
COSI: 3DSIG COSI
  • Md Hossain Shuvo, Auburn University, United States
  • Debswapna Bhattacharya, Auburn University, United States
  • Sutanu Bhattacharya, Auburn University, United States

Short Abstract: Protein model quality estimation, in many ways, informs protein structure prediction. Despite their tight coupling, existing model quality estimation methods do not leverage inter-residue distance information or the latest technological breakthrough in deep learning that has recently revolutionized protein structure prediction. We present a new distance-based single-model quality estimation method called QDeep by harnessing the power of stacked deep residual neural networks (ResNets). Our method first employs stacked deep ResNets to perform residue-level ensemble error classifications at multiple predefined error thresholds and then combines the predictions from the individual error classifiers for estimating the quality of a protein structural model. Experimental results show that our method consistently outperforms existing state-of-the-art methods including ProQ2, ProQ3, ProQ3D, ProQ4, 3DCNN, MESHI, and VoroMQA in multiple independent test datasets across a wide range of accuracy measures; and that predicted distance information significantly contributes to the improved performance of QDeep.
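
The combination step described above can be pictured as follows: each error classifier gives, for every residue, a probability that its error is below one threshold, and these probabilities are merged into a single score per model. The abstract does not say how the predictions are combined, so the plain averaging used below, the function names, and the example probabilities are assumptions, not the QDeep scoring function.

```python
def residue_score(probs_by_threshold):
    """Mean, over error thresholds, of the predicted probability that a
    residue's error is below that threshold (averaging is an assumption)."""
    return sum(probs_by_threshold) / len(probs_by_threshold)


def model_quality(per_residue_probs):
    """Average the residue-level scores into one global quality estimate."""
    scores = [residue_score(p) for p in per_residue_probs]
    return sum(scores) / len(scores)


# Hypothetical outputs of two classifiers (two thresholds) for two residues.
print(model_quality([[1.0, 1.0], [0.0, 1.0]]))
```

Averaging over several thresholds rewards residues that are accurate at fine tolerances while still giving partial credit to those that are only roughly placed.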

Quality Assessment of Protein Docking Models Based on Graph Neural Network
COSI: 3DSIG COSI
  • Ye Han, Jilin Agricultural University, China
  • Fei He, University of Missouri-Columbia, United States
  • Dong Xu, Univ. of Missouri-Columbia, United States

Short Abstract: Protein docking provides a structural basis for the design of drugs and vaccines. Within the protein docking process, quality assessment (QA) is used to pick near-native models from numerous candidate docking conformations, and it directly determines the final docking results. Although extensive efforts have been made to improve QA accuracy, it is still the bottleneck of current protein docking systems. Here, we present a Deep Graph Attention Neural Network (DGANN) to evaluate and rank protein docking candidate models. DGANN learns inter-residue physicochemical properties and structural fitness across the two protein monomers in a docking model and outputs the probability that the model is near-native. On the ZDOCK decoy benchmark, DGANN outperformed the ranking provided by ZDOCK in terms of ranking good models among the top selections.

RCSB PDB Next-generation Data Delivery and Search Services
COSI: 3DSIG COSI
  • Jose Manuel Duarte, RCSB Protein Data Bank, UC San Diego, United States
  • Charmi Bhikadiya, RCSB Protein Data Bank, Rutgers University, United States
  • Chunxiao Bi, RCSB Protein Data Bank, UC San Diego, United States
  • Sebastian Bittrich, RCSB Protein Data Bank, UC San Diego, United States
  • Li Chen, RCSB Protein Data Bank, Rutgers University, United States
  • Dmytro Guzenko, RCSB Protein Data Bank, UC San Diego, United States
  • Robert Lowe, RCSB Protein Data Bank, Rutgers University, United States
  • Joan Segura, RCSB Protein Data Bank, UC San Diego, United States
  • Yana Valasatava, RCSB Protein Data Bank, UC San Diego, United States
  • John D. Westbrook, RCSB Protein Data Bank, Rutgers University, United States
  • Stephen K. Burley, RCSB Protein Data Bank, Rutgers University, United States

Short Abstract: The RCSB Protein Data Bank (PDB) provides tools for analysis and visualization of 3D structures of biological macromolecules stored in the PDB archive. Recently introduced Search and Data Delivery APIs offer comprehensive functionality and high performance at RCSB.org. The new services represent a complete overhaul of the software/data management architecture, transforming a monolithic application into a micro-service-oriented and cloud-ready resource. The data model is based on the PDBx/mmCIF dictionary (mmcif.wwpdb.org/) with extensions that facilitate usage and delivery for the RCSB PDB website and web services.

For data delivery (data.rcsb.org), a GraphQL interface allows arbitrary retrieval of data across the entire data model. To the best of our knowledge, this represents a first in structural bioinformatics.

Search services (search.rcsb.org) are supported by a powerful Search API with a JSON-based Domain Specific Language (DSL). Arbitrary boolean logic search is now possible across all fields available in our data model. Importantly, a search aggregator layer seamlessly combines text searches from the Elasticsearch engine with specialized bioinformatics algorithms that perform searches against macromolecular sequence and/or atomic coordinate data. Examples of the searches integrated by the aggregator are mmseqs2 sequence search, BioZernike structure shape search, and sequence motif search.
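
To give a feel for a JSON-based query with boolean logic, the sketch below assembles a request body that combines a full-text term with an attribute filter under a logical AND. The node layout ("terminal" and "group" nodes, "logical_operator", "return_type") and the attribute name reflect our understanding of the public Search API and are not taken from the abstract; they should be checked against the current API documentation before use. The helper names and search values are hypothetical, and no request is sent.

```python
import json


def terminal(service, parameters):
    """A leaf node: one search handled by one service (assumed layout)."""
    return {"type": "terminal", "service": service, "parameters": parameters}


def group(operator, nodes):
    """A boolean node combining child nodes with 'and' or 'or'."""
    return {"type": "group", "logical_operator": operator, "nodes": nodes}


# Assumed attribute name and operator; verify against the API schema.
query = {
    "query": group("and", [
        terminal("full_text", {"value": "protein kinase"}),
        terminal("text", {
            "attribute": "rcsb_entry_info.resolution_combined",
            "operator": "less",
            "value": 2.0,
        }),
    ]),
    "return_type": "entry",
}

print(json.dumps(query, indent=2))
```

Because groups can contain other groups, arbitrarily nested boolean expressions can be built from the same two helpers.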

Redundancy-Weighting the PDB for Detailed Secondary Structure Prediction
COSI: 3DSIG COSI
  • Tomer Sidi, Ben-Gurion University of the Negev, Israel
  • Chen Keasar, Ben Gurion University of the Negev, Israel

Short Abstract: The Protein Data Bank (PDB), the ultimate source for data in structural biology, is inherently imbalanced. To alleviate biases, virtually all structural biology studies use non-redundant subsets of the PDB, which include only a fraction of the available data. An alternative approach, dubbed redundancy-weighting, down-weights redundant entries rather than discarding them. This approach may be particularly helpful for Machine Learning (ML) methods that use the PDB as their source for data.

Current state-of-the-art methods for Secondary Structure Prediction of proteins (SSP) use non-redundant datasets to predict either 3-letter or 8-letter secondary structure annotations. The current study challenges both choices: the dataset and the alphabet size. Non-redundant datasets are presumably unbiased, but they are also inherently small, which limits machine learning performance. On the other hand, the utility of both 3- and 8-letter alphabets is limited by the aggregation of parallel, anti-parallel, and mixed beta-sheets into a single class. Each of these subclasses imposes different structural constraints, which makes the distinction between them desirable. In this study we show an improvement in prediction accuracy by training on a redundancy-weighted dataset. Further, we show that the information content is improved by extending the alphabet to consider beta subclasses while hardly affecting SSP accuracy.
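
The contrast between discarding and down-weighting redundant entries can be illustrated with a very simple scheme in which every entry is weighted by the inverse of the size of its redundancy cluster, so that each cluster contributes a total weight of one. The abstract does not define the weighting actually used, so this inverse-cluster-size rule, the function name, and the cluster labels are assumptions.

```python
from collections import Counter


def redundancy_weights(cluster_ids):
    """Weight each entry by 1 / (size of its cluster), so that every cluster
    sums to a total weight of 1 (an illustrative weighting scheme)."""
    sizes = Counter(cluster_ids)
    return [1.0 / sizes[cluster] for cluster in cluster_ids]


# Hypothetical clustering: two near-identical chains and one singleton.
print(redundancy_weights(["kinase_a", "kinase_a", "globin_b"]))
```

Such weights could be passed as per-sample weights to a training loss, keeping all entries in the training set while limiting the influence of over-represented families.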

SARS-CoV-2 spike protein predicted to bind strongly to host receptor protein orthologues from mammals, but not fish, birds or reptiles
COSI: 3DSIG COSI
  • NL Dawson, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • Christine Orengo, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • JM Santini, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • JG Lees, Oxford Brookes University, United Kingdom
  • F Fraternali, Randall Division of Cell and Molecular Biophysics, King’s College London, United Kingdom
  • SJL Edwards, University College London, United Kingdom
  • I Sillitoe, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • M Abbasian, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • CSM Pang, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • SD Lam, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
  • C Rauer, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • L van Dorp, University College London, United Kingdom
  • N Sen, Indian Institute of Science Education and Research, Pune, 411008, India
  • P Ashford, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • HM Scholes, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • VP Waman, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • N Bordin, Institute of Structural and Molecular Biology, University College London, United Kingdom

Short Abstract: The coronavirus disease 2019 (COVID-19) global pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). SARS-CoV-2 has a zoonotic origin and was transmitted to humans via an undetermined intermediate host, leading to widespread infections in humans and reported infections in other mammals. To enter host cells, the viral spike protein binds to angiotensin-converting enzyme 2 (ACE2) and is processed by transmembrane protease serine 2 (TMPRSS2). Whilst receptor binding contributes to the viral host range, changes in the energy of the spike protein:ACE2 complex in orthologues from other animals have not been widely explored. Here, we analyse interactions between the spike protein and orthologues of ACE2 and TMPRSS2 from 215 vertebrate species. We predicted structures for these orthologues, used structures of the spike protein:ACE2 complex to calculate changes in the energy of the complex, and correlated these with COVID-19 severities in mammals. Across vertebrate orthologues, mutations are predicted to be more disruptive to the structure of ACE2 than to that of TMPRSS2. Finally, we provide phylogenetic evidence that SARS-CoV-2 has recently been transmitted from humans to animals. Our results suggest that SARS-CoV-2 can infect a broad range of mammals, but not fish, birds or reptiles, which could serve as reservoirs of the virus, necessitating careful ongoing animal management and surveillance.

Structural Basis and Key Networks of the SARS-CoV2 Main Protease Active Site
COSI: 3DSIG COSI
  • Navaneethakrishnan Krishnamoorthy, Sidra Medicine & Imperial College London, Qatar

Short Abstract: The current pandemic is caused by the outbreak of a novel coronavirus, SARS-CoV2 (CoV2, the cause of COVID-19), whose genome is highly similar (88%) to that of SARS-CoV (CoV), which emerged in 2003. Both viruses initiate and regulate infection in the host mostly through their molecular machine called the main protease (Mpro). Although the X-ray 3D structures of CoV Mpro and CoV2 Mpro are similar, there are 12 variable residues between them. This study compared these 3D structures to understand the interaction networks of the 12 key residues at their active site regions. Compared to CoV, CoV2 indirectly (via neighbours) reshapes the key intramolecular networks at the active site of Mpro, specifically at the entrance and near the catalytic region. CoV2 Mpro thus remains structurally identical to CoV Mpro while modifying the key positions through mass mutations (12 variable residues). Hypothetically, the modified networks may account for some of the intelligent molecular machinery used in this new viral system. This basic study provides a few valuable structural networks of CoV2 Mpro at the inhibitor pocket, which may be considered when designing efficient anti-CoV2 Mpro drugs.

Studying de novo mutations via structural alterations in protein-protein interaction: STXBP1 associated neuronal pathology
COSI: 3DSIG COSI
  • Ehud Banne, Kaplan Medical Center, Rehovot, Israel
  • Esther Brielle, The Hebrew University of Jerusalem, Israel
  • Danielle Klinger, The Hebrew University of Jerusalem, Israel
  • Dina Schneidman-Duhovny, The Hebrew University of Jerusalem, Israel
  • Michal Linial, The Hebrew University of Jerusalem, Israel

Short Abstract: A large fraction of childhood epilepsy, developmental delays and neurodevelopmental diseases (NDD) is attributed to de novo mutations, including missense mutations and in-frame indels. Often, despite a detailed genetic, no explanation exists for the manifestation of the disease or its. In this study, we use 3D structural data of protein complexes to assess the impact of specific mutations on protein-protein interactions (PPI). We focused on STXBP1 (also known as Munc-18), a master regulator of synaptic function and neurotransmitter release. Many de novo STXBP1 mutations lead to epilepsy and diverse forms of NDD. We applied structural modeling and molecular dynamics (MD) simulations to quantify the stability and properties of the STXBP1 interaction with syntaxin 1A. We show that, while state-of-the-art variant prediction tools gave discordant interpretations, we could assess mutations by their pathological severity in a way that matches the calculated properties of the STXBP1-syntaxin 1A interface. Mutations that reduce the interaction surface area of STXBP1-syntaxin 1A led to destabilization of the protein complex and eventually to a disruption in synaptic transmission. This study provides a direct approach that connects novel variants with 3D structure and dynamics. The method is extended to protein complexes associated with other rare clinical diseases.

The ResiRole Server to Enable Assessments of Structure Prediction Techniques Using Functional Site Predictions
COSI: 3DSIG COSI
  • Joshua Toth, Geisinger Commonwealth School of Medicine, United States
  • Paul DePietro, Geisinger Commonwealth School of Medicine, United States
  • Juergen Haas, SIB Swiss Institute of Bioinformatics, Switzerland
  • William McLaughlin, Geisinger Commonwealth School of Medicine, United States

Short Abstract: The Continuous Automated Model Evaluation (CAMEO) platform presents the results of protein structure predictions generated by the hosted structure prediction servers for pre-release sequences from the Protein Data Bank. Here we describe the ResiRole server, protein.som.geisinger.edu/ResiRole/, which assesses the structure models available through CAMEO by how closely their SeqFEATURE functional site predictions match those at the corresponding sites in the reference structures. The results are presented as average difference scores per structure prediction technique and per structure model, where each difference score is defined as the absolute difference between the cumulative probability of the functional site prediction in the reference structure and that at the corresponding site in the structure model. Results are accessible according to target difficulty based on lDDT score ranges. The difference score is compared to other metrics for estimating structure model quality that are based on the reference structures, and is found to be a complementary quality metric. For example, when using the difference score as the benchmark, we find that the quality of structure models produced by NaïveBLAST is on average underestimated. The results indicate that NaïveBLAST models may contain more information on local functional site predictions than previously estimated.
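
The difference score defined in the abstract above can be illustrated with a minimal Python sketch. This is not the ResiRole implementation: the function names, the record layout, the technique labels ("server_A", "server_B") and the probability values are all hypothetical, chosen only to show the arithmetic of taking an absolute difference per functional site and averaging per prediction technique.

```python
from collections import defaultdict
from statistics import mean


def difference_score(p_reference, p_model):
    """Absolute difference between the cumulative probability of a
    functional site prediction in the reference structure and at the
    corresponding site in the structure model."""
    return abs(p_reference - p_model)


def average_by_technique(records):
    """Average the difference scores per structure prediction technique.

    `records` is an iterable of (technique, p_reference, p_model) tuples.
    This layout is an assumption made for illustration only.
    """
    scores = defaultdict(list)
    for technique, p_reference, p_model in records:
        scores[technique].append(difference_score(p_reference, p_model))
    return {technique: mean(values) for technique, values in scores.items()}


# Hypothetical values, not taken from CAMEO or ResiRole.
records = [
    ("server_A", 0.90, 0.80),
    ("server_A", 0.50, 0.70),
    ("server_B", 0.60, 0.60),
]
averages = average_by_technique(records)
for technique, value in sorted(averages.items()):
    print(f"{technique}: {value:.3f}")
```

Under this definition, a lower average means the functional site predictions in the models agree more closely with those in the reference structures.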

The vestibule role of membrane-water interface as the intermediate stage in a new three-stage model for helical membrane protein folding
COSI: 3DSIG COSI
  • Bridget Kawamala, California State University, Northridge, United States
  • Ravinder Abrol, California State University, Northridge, United States

Short Abstract: Transmembrane alpha-helical (TMH) proteins play critical roles in cellular signaling. They display a diversity of structural folds featuring almost-parallel orientation of TM helices packing into helical bundles. The membrane environment enormously reduces the accessible conformational landscape for folding, but also makes folding experiments challenging. The contribution of helix insertion energies to the folding energy landscape was computed using structural-bioinformatics-based hydropathy analysis for most of the polytopic helical membrane proteome (from 1-TMH to 24-TMH proteins with structures). The magnitudes of TM helix insertion energies from water to the membrane-water interface (WAT→INT energies) are on average half of the insertion energies from water to the transmembrane-helix orientation (WAT→TMH energies), suggesting a potential vestibule role of the membrane-water interface for the TM helices after translocon exit. This is confirmed by showing the stability of very hydrophobic TM helices in the membrane-water interface through multiple microsecond-long molecular dynamics simulations of a stop-transfer helix, a re-integration helix, and a pre-folded helical hairpin from the ribosomal exit vestibule. A three-stage folding model is therefore proposed to extend Popot-Engelman’s original two-stage model, in which the membrane-water interface acts as an intermediate-stage holding vestibule for translated TM helices, reconciling the interface’s critical role seen in many previous studies.